ABSTRACT

We consider the problem of opportunistic dynamic spectrum
access (DSA) in an ad hoc network in which unlicensed secondary
users communicate through channels not used by the primary
users. Decentralized cognitive medium access control protocols
are presented that allow secondary users to recognize spectrum
opportunities and transmit based on a partial observation of the
instantaneous spectrum availability. Under the framework of a
Partially Observable Markov Decision Process (POMDP), we derive
optimal and suboptimal decentralized strategies for the secondary
users to decide which channel(s) to sense and access so as to
maximize the overall network throughput.

1.  INTRODUCTION 

Static spectrum allocation strategies effectively bypass the
problems of spectrum coordination and, with adequate guard
bands, avoid interference. Such fixed allocations, however,
can be wasteful when the primary user has no data to transmit,
or may result in unacceptable drops and delays if the user’s
demand is too high. Dynamic Spectrum Access (DSA) [1]
represents a new paradigm of spectrum management, a shift
from static allocation to dynamic access, allowing opportunistic
communications based on user demands and channel availability.
DSA will become increasingly important as overlay schemes and
the use of unlicensed spectra increase. DSA is also critical in
coping with traffic load variations over time and space and in
allowing heterogeneous networks to coexist without explicit
coordination.

1.1.  Dynamic  Spectrum  Access 

Two approaches to DSA have been envisioned: dynamic spectrum
allocation and opportunistic spectrum access. While sharing
certain common features, these two approaches differ in their
rationale, technological challenges, and domains of application.
Dynamic spectrum allocation, brought forth by the European DRiVE
project [2], mainly focuses on long-term commercial applications
such as UMTS and DVB-T. By exploiting temporal and spatial
traffic statistics, it aims to improve spectrum efficiency through
time- and space-dependent spectrum sharing among coexisting radio
services. For example, the amount of spectrum allocated to UMTS
and DVB-T varies by region and time of day. Similar to the current
static spectrum allotment policy, such DSA strategies allocate, at
a given time and region, a portion of the spectrum to a radio
access network for its exclusive use. As such, white space in the
spectrum due to bursty traffic cannot be eliminated.

Q. Zhao is with the Department of Electrical and Computer
Engineering, University of California, Davis, CA 95616,
qzhao@ece.ucdavis.edu

L. Tong is with the School of Electrical and Computer Engineering,
Cornell University, Ithaca, NY 14853, ltong@ece.cornell.edu

A. Swami is with the Army Research Laboratory, Adelphi, MD 20783,
aswami@arl.army.mil

This work was supported in part by the Multidisciplinary University
Research Initiative (MURI) under the Office of Naval Research Contract
N00014-00-1-0564, and Army Research Laboratory CTA on Communication
and Networks under Grant DAAD19-01-2-0011.

Unlike dynamic spectrum allocation, which uses the statistics
of spectrum occupancy, opportunistic spectrum access as
envisioned by the DARPA XG program [3] aims to exploit the
instantaneous spectrum availability by opening licensed spectrum
to secondary users. The idea is to allow secondary users to
identify available spectrum resources and communicate
opportunistically in a manner that limits the level of
interference perceived by primary users. It thus has the
potential of eliminating white space in the spectrum. Such DSA
strategies are more relevant to applications requiring rapid but
short-term deployment and applications denied cooperation from
existing radio access networks. Examples include military units
penetrating deep into unknown and/or hostile territories, wireless
networks established for particular social events, or sensor
networks deployed for specific tasks. Requiring little cooperation
from the spectrum licensees, opportunistic spectrum access can be
overlaid on the current static allotment policy as well as the
envisioned dynamic spectrum allocation. Besides software defined
radio, the technological underpinning of opportunistic spectrum
access includes efficient spectrum sensing for opportunity
identification and adaptive medium access and networking
protocols for opportunity utilization.

1.2.  Decentralized  Cognitive  MAC 

In this paper, we focus on DSA of the second kind in ad
hoc networks: opportunistic spectrum access based on
instantaneous network state. One of the most crucial and
difficult challenges in such networks is the design of cognitive
medium access control (MAC) that recognizes and utilizes
spectrum opportunities for optimal network performance. For
ad hoc networks without a central authority, it is desirable
to have a Decentralized Cognitive MAC (DC-MAC) where
each node decides individually how to sense the spectrum
and how to gain access. Because users cannot exchange local
information on channel availability before agreeing on a
communication channel, such a protocol should not rely on
cooperation among secondary users. Thus the design of DC-MAC
for DSA networks is more challenging than that for
standard ad hoc networks.

We focus on cognitive MAC where each node must sense
the channel intelligently by exploiting statistical traffic
behavior. We do not assume that each secondary node has
perfect knowledge of the availability of all channels; such
knowledge would require full-spectrum sensing synchronized
among users. We assume instead that each node can choose to
sense a subset of the possible channels and must decide whether
transmission is possible based on the sensing outcome. As such,
each node observes only a partial, not the full, state of the
network. Furthermore, we allow sensing errors: overlooking an
available channel and mistakenly identifying an unavailable
channel as an opportunity.

The key step in the design of DSA is to capture the dynamic
behavior of the spectral vacancy, which fundamentally
determines the network performance. The modelling of DSA
dynamics needs to incorporate channel availability, channel
bandwidth, and traffic pattern. The last plays a crucial role in
protocol design. For example, if we know that a primary user
favors a particular channel and tends to occupy it for a long
period of time, that channel is less likely to be available to
a secondary user, and sensing it would likely be a waste of
time and energy.

There is also a need to consider sensing and access jointly,
which leads to a cross-layer design for DSA. For example, if
a node has no packet to send, does it make sense to perform
channel sensing? Sensing costs energy but gains information
about the network state. Sensing proactively better prepares
the node for transmission but at the cost of energy consumption.
What would then be the tradeoff between energy consumption and
spectrum utilization?

1.3.  Scope,  Contribution,  and  Related  Work 

Scope  The scope of this paper is limited to the technical
(and more analytical) aspects of the MAC design for DSA ad
hoc networks. Specifically, we focus on the theoretical
formulation and characterization of decentralized medium
access and the development of optimal DC-MAC protocols.

While we are not concerned about specific implementation
details (e.g., packet format and bit definitions), we are
concerned about implementation complexity, both in computation
and storage. We take an analytical approach, imposing
certain idealized modelling assumptions. It is our hope,
however, that the results presented here can provide insights
into the design of more complicated DSA networks under more
realistic assumptions.

Contribution  The contribution of this paper is threefold.
We provide (i) an analytical framework of DC-MAC in DSA
ad hoc networks; (ii) a characterization of the optimal
protocol; (iii) the development of a low-complexity suboptimal
greedy algorithm.

Analytical  Framework  We  present  an  analytical  framework 
for  the  design  of  DC-MAC  for  DSA  ad  hoc  networks.  This 
framework  includes  three  components.  Jointly,  they  define 
protocols  that  integrate  channel  sensing  and  access. 

The first component is a channel occupancy model that
captures the dynamics of channel availability. By using a
Markov chain formulation, we incorporate traffic characteristics
of the primary and secondary users. For example, given
that a channel is currently occupied by a primary user, the
probability that the primary user will need it for the next slot
is modelled by the state transition probability.

The second is the performance metric, the objective function
that defines the optimal strategy. In this paper, we focus
on maximizing spectrum utilization by designing a DC-MAC
that maximizes the average throughput. The formulation,
however, can be easily tailored to incorporate energy
consumption by imposing constraints or penalties.

The  third  component  is  a  decision  theoretic  approach  to 
selecting  which  channel  to  sense  and  access  given  the  node’s 
past  sensing  history,  channel  occupancy  statistics,  and  the 
reward-cost  of  sensing  and  transmission. 

Optimal DC-MAC  We present next an optimization framework
based on the theory of Partially Observable Markov
Decision Process (POMDP). The network state is partially
observable due to a more practical and more general sensing
model that allows a user to sense, not all, but a subset of
channels. The structure of the optimal DC-MAC is obtained
following the classical work of Smallwood and Sondik [4]. Our
formulation is also amenable to other POMDP techniques.

Suboptimal Greedy DC-MAC  The optimal DC-MAC requires
the update and storage of an information vector whose
dimension grows exponentially with the number of channels.
We show that the required sufficient statistic has a
dimension linear in the number of channels. This leads to a
much simplified suboptimal algorithm based on a greedy
approach.

Related Work  A majority of existing work on DSA focuses
on the approach of dynamic spectrum allocation [2, 5-16].
The European DRiVE project [2] focuses on dynamic spectrum
allocation in heterogeneous networks by assuming a
(logical) common coordination channel. The efficiency of
DSA will depend upon the ability to predict traffic load (thus
spectrum occupancy). A simulation study of the impact of
load prediction based on load history and simple regression
schemes is reported in [15]. Regulatory aspects and issues
in DSA across multiple networks are discussed in [5]. Two
centralized DSA protocols that rely on a super-base station
are described in [8] and their performance evaluated via
simulations.

There have been several attempts at developing cognitive
MAC for opportunistic spectrum access [17-21]. These
techniques tackle the problem in two separate steps: (i) the
development of an opportunity identification module using
classical detection and estimation techniques assuming
continuous full-spectrum sensing; (ii) the design of an
opportunity allocation module by, for example, graph coloring
techniques assuming full knowledge of spectrum opportunities.
Missing in this line of approaches are two ingredients:
energy-efficient sensing and the ability to handle bursty
traffic. First, the assumption of continuous full-spectrum
sensing, while simplifying the design of cognitive MAC, is
undesirable and impractical due to the energy consumption and
the hardware implications. Second, traffic characteristics,
especially burstiness, should play a crucial role in any
efficient cognitive MAC scheme. Why pay for the knowledge
of every opportunity in the whole spectrum all the time
when a user only has sporadic needs for spectrum access?
Here lies a clear disadvantage of existing approaches that
decouple opportunity identification and opportunity allocation.

2.  PROBLEM  STATEMENT 

Network and Channel Model  Consider a spectrum containing
$N$ channels$^1$, each with bandwidth $B_i$ ($i = 1, \dots, N$).
These $N$ channels are shared among primary users and a
large number of secondary users seeking spectrum opportunities.
The traffic statistics of the primary users are such that
these $N$ channels are synchronous and slotted. We also
assume that the spectrum usage statistics remain unchanged for
$T$ slots. The energy and hardware constraints restrict the
secondary users from monitoring more than one channel within
one slot. Extensions to a more general sensing model are
discussed in Section 5.
Fig. 1: The Markov channel model

We focus on a decentralized DSA ad hoc network where
a large number of secondary users join/exit the network and
sense/access the spectrum independently without exchanging
local information. We assume that when the network
reaches a steady state, each channel independently presents
itself as an opportunity to a secondary user according to a
Markov process. As illustrated in Figure 1, channel states
are represented by 0 (busy) and 1 (idle, thus available to the
secondary user). State transitions occur at the beginning of
each slot with transition probabilities given by
$\{\alpha_i, \beta_i\}$ ($i = 1, \dots, N$), where, as used in (7),
$\alpha_i$ is the probability that channel $i$ moves from busy to
idle and $\beta_i$ is the probability that it remains idle. Since
the unavailability of a channel may also be caused by channel
fading, the Markov chain model can also include fading statistics.

1 Here we use the term channel broadly. A channel can be a frequency
band with certain bandwidth. It can also be a collection of spreading codes
in a CDMA network or a set of tones in an OFDM system.
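
The two-state channel model above can be illustrated with a short simulation. This is a minimal sketch, not part of the protocol: it assumes that $\alpha$ is the busy-to-idle probability and $\beta$ the idle-to-idle probability (the reading consistent with (7)), and the function names and parameter values are hypothetical.

```python
import random

def stationary_idle_probability(alpha, beta):
    """Long-run fraction of slots in which the channel is idle (state 1).

    Assumes alpha = Pr[busy -> idle] and beta = Pr[idle -> idle],
    and that alpha + 1 - beta > 0.
    """
    return alpha / (alpha + 1.0 - beta)

def simulate_channel(alpha, beta, num_slots, seed=0):
    """Simulate one channel and return its 0/1 state in each slot."""
    rng = random.Random(seed)
    state = 0  # start busy (arbitrary initial condition)
    states = []
    for _ in range(num_slots):
        # The transition happens at the beginning of each slot.
        p_idle = beta if state == 1 else alpha
        state = 1 if rng.random() < p_idle else 0
        states.append(state)
    return states

# Illustrative parameters only.
states = simulate_channel(alpha=0.2, beta=0.8, num_slots=100_000)
empirical = sum(states) / len(states)
theoretical = stationary_idle_probability(0.2, 0.8)
```

With these illustrative values the empirical idle fraction should be close to the stationary probability, which indicates how often a secondary user would find the channel available in the long run.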

Objectives  We first seek answers to a fundamental question:
what is the optimal DSA strategy that maximizes the
average number of bits transmitted by the secondary user in
$T$ slots? Specifically, we seek the optimal DSA protocol for
the secondary user to determine in each slot which channel to
monitor and subsequently access so that the average number
of bits transmitted in $T$ slots is maximized. We then exploit
the specific structure of the problem in search of simpler but
near-optimal solutions.

Protocol Specifics  We present here a CSMA-based implementation.
We first discuss the basic structure of the protocol
and then comment on several implementation issues.

Protocol Structure  We assume that channels are slotted,
and the slot timing is broadcast$^2$. The beginning of each slot
is dedicated to channel sensing. The primary users have the
highest priority, and they sense the channel first based on a
priority-related backoff scheme. For example, we can
impose a minimum value on the backoff time of secondary
users. The primary user can claim the slot before secondary
users start sensing. The choice of the minimum backoff time
for secondary users depends on the propagation delay among
neighboring nodes and how much interference from secondary
users the network can tolerate.

A  secondary  user  with  data  to  transmit  will  have  to  decide 
which  channel  to  sense.  Such  decisions  are  based  on  its  past 
sensing  history  and  channel  statistics  using  the  optimal  or 
suboptimal  protocols  presented  in  Section  3  and  Section  4. 
If  it  decides  to  sense  a  particular  channel,  it  will  generate  a 
random  backoff,  possibly  a  function  of  its  energy  level  or  its 
channel  state  [22],  and  it  will  transmit  when  the  backoff  timer 
expires  and  no  one  claims  the  channel. 
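
The priority-based backoff described above can be sketched as a toy contention model. This is an illustration under strong assumptions, not the protocol's specification: all contending secondary users are assumed to have chosen the same channel, timers are drawn uniformly from a hypothetical interval `[min_backoff, max_backoff]`, and ties, propagation delay, and sensing errors are ignored.

```python
import random

def contend_for_slot(num_secondary, primary_active, min_backoff, max_backoff, rng):
    """Return the index of the secondary user that claims the slot, or None.

    Hypothetical model: an active primary user claims the slot before
    min_backoff elapses, so no secondary user transmits. Otherwise the
    secondary user whose random backoff timer expires first transmits.
    """
    if primary_active or num_secondary == 0:
        return None
    timers = [rng.uniform(min_backoff, max_backoff) for _ in range(num_secondary)]
    return min(range(num_secondary), key=timers.__getitem__)

rng = random.Random(1)
winner = contend_for_slot(5, primary_active=False,
                          min_backoff=1.0, max_backoff=4.0, rng=rng)
blocked = contend_for_slot(5, primary_active=True,
                           min_backoff=1.0, max_backoff=4.0, rng=rng)
```

The minimum backoff is what gives the primary user priority in this sketch: no secondary timer can expire before it.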

Transmitter-Receiver Synchronization  The transmitter and
the receiver need to tune to the same channel in order to
communicate, and they need to hop synchronously. The
synchronization problem can be separated into two phases: the
initial handshake between the transmitter and the receiver and
the synchronous hopping in the spectrum after the initial
establishment of communication.

There are a number of standard implementations to facilitate
the initial handshake. Here we borrow the idea of
receiver-oriented code assignment in CDMA ad hoc networks
[23]. Specifically, each secondary user is assigned a set of
channels (not necessarily unique) which it monitors regularly
to check whether it is an intended receiver. A user with
a message for, say, user A will transmit a handshake signal
over one of the channels assigned to user A. Once the
initial communication is established, the transmitter and the
receiver will implement the same DC-MAC protocol, which
governs channel selection in each slot. In this paper, we focus
on the design of DC-MAC protocols assuming that the
initial handshake has been established.

2 The slot information can be broadcast by the primary users.

Collision Resolution and Avoidance  In a network with a
large number of secondary users seeking spectrum opportunities
independently, there needs to be a mechanism to deal
with collisions. The proposed DC-MAC schemes described
in Section 3 and Section 4 make the access decision based
on the sufficient statistic captured by the information vector,
which in a way randomizes the choices of secondary users
and reduces the probability of collision.

Collisions can be further reduced by the use of classical
random access techniques such as CSMA or ALOHA.
Specifically, a secondary user who has identified an opportunity
senses the carrier using a random backoff time before
accessing the channel. While such techniques do not solve
the problem of hidden/exposed terminals, our protocol can
be tailored to incorporate busy-tone based techniques.

We first study the design of DC-MAC assuming perfect
collision avoidance. We then extend in Section 5 the proposed
protocols to incorporate collisions, which can be modelled
as misidentification of spectrum opportunity.

3.  OPTIMAL  STRATEGY 

In this section, we formulate the DC-MAC problem as a
POMDP. Under this decision-theoretic framework, we derive
sufficient statistics and establish the optimal DC-MAC
protocol.

3.1.  The  POMDP  Formulation 

The system of $N$ channels given in Section 2 can be modelled
by a discrete-time Markov chain with $M = 2^N$ states,
where the state is defined as the availability of each channel.
The transition probability $p_{i,j}$ can be readily obtained from
$\{\alpha_i, \beta_i\}_{i=1}^{N}$. The state diagram for $N = 2$ is
illustrated in Figure 2, where $\bar{\alpha}_i = 1 - \alpha_i$ and
state $(0, 1)$ indicates that the first channel is busy whereas the
second channel is available.


Fig.  2:  The  underlying  Markov  process  for  N  =  2. 
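
Because the channels evolve independently, the joint transition probabilities $p_{i,j}$ over the $2^N$ states can be obtained as a Kronecker product of the per-channel matrices. The following is a minimal sketch of that construction; the helper names, the state-indexing convention (channel 1 as the most significant bit), and the numerical values are assumptions made for illustration.

```python
def channel_matrix(alpha, beta):
    """2x2 transition matrix of one channel; index 0 = busy, 1 = idle.

    Assumes alpha = Pr[busy -> idle] and beta = Pr[idle -> idle].
    """
    return [[1.0 - alpha, alpha], [1.0 - beta, beta]]

def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def joint_transition_matrix(alphas, betas):
    """Transition matrix over all 2^N joint states of independent channels.

    The joint state index reads the channel states as a binary number with
    channel 1 as the most significant bit; for N = 2, index 1 is state (0, 1).
    """
    P = [[1.0]]
    for alpha, beta in zip(alphas, betas):
        P = kron(P, channel_matrix(alpha, beta))
    return P

# Illustrative parameters for N = 2.
P = joint_transition_matrix([0.2, 0.3], [0.8, 0.6])
# Probability of moving from state (0, 1) to state (1, 0):
# channel 1 goes busy -> idle (0.2), channel 2 goes idle -> busy (0.4).
p_01_to_10 = P[1][2]
```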

Since  in  each  slot,  the  user  can  only  select  one  channel  to 
monitor,  the  state  of  the  system  is  only  partially  observable. 


The  problem  of  designing  a  DSA  protocol  that  maximizes 
the  transmission  rate  in  T  slots  can  then  be  formulated  as 
a  POMDP  over  a  finite  horizon.  Specifically,  this  POMDP 
consists  of 

•  Decision intervals $\{1, \dots, T\}$: slots;

•  States $S \in \{1, \dots, M\}$: availability of each channel;

•  Transition probabilities $p_{i,j}$: functions of $\{\alpha_i, \beta_i\}$;

•  Actions $a \in \{1, \dots, N\}$: sense channel $a$ and access
if available;

•  Observation $\Theta_{j,a} \in \{0, 1\}$: availability of the chosen
channel $a$ at state $j$;

•  Reward $w_{a,\theta} = \theta B_a$: number of transmitted bits when
the observation is $\theta$ under action $a$.

Following the illustration given in [4], we show in Figure 3
the sequence of operations in a decision interval. Specifically,
at the beginning of a decision interval, the system state
transits according to $p_{i,j}$. According to a chosen action $a$,
which specifies the channel to be sensed in this decision
interval (slot), the user senses the channel and transmits if the
chosen channel is available. The result of channel sensing is
given by $\Theta_{j,a}$, which indicates the availability of the chosen
channel. A reward $w_{a,\theta}$, determined by the observation and
the action, is obtained at the end of this slot.


[Figure 3 (timeline of one slot, with $n$ denoting the number of
remaining decision intervals): state transition ($p_{i,j}$), select
action ($a$), sense and access ($\theta$), reward.]

Fig. 3: The sequence of operations in a slot.


Under this formulation, the optimal DC-MAC protocol is
given by the optimal policy (in terms of maximizing the
expected reward in $T$ slots) of this POMDP over a finite
horizon$^3$.

3.2.  Sufficient  Statistic 

Let $n$ ($n = 1, \dots, T$) denote the number of remaining
decision intervals. For a finite horizon POMDP over $T$ slots, the
problem is to select in slot $T - n + 1$ an action $a$ that will
optimize the system performance in the remaining $n$ decision
intervals.

Since the underlying Markov process is only partially
observable, the internal state of the system is unknown. Our
knowledge of the internal state of the system based on all the
past decisions and observations can be encoded as an information
vector $\pi = [\pi_1, \dots, \pi_M]$, where $\pi_i$ is the conditional
probability (given all the past sensing history) that the state
of the system is $i$ at the beginning of the current decision
interval prior to the state transition.

3 The optimal DC-MAC can also be formulated as a POMDP over an
infinite horizon. Since the statistics of channel occupancy vary with time due
to changes in traffic load, it is more appropriate to consider the finite horizon
formulation. It is, however, straightforward to extend our formulation to an
infinite horizon setup in which stationary policies can be obtained.

It has been shown in [4] that at any time the information
vector $\pi$ is a sufficient statistic for the optimal policy. It is
easy to see that the dynamic behavior of the information vector
$\pi$ is itself a discrete-time continuous-state Markov process.
Given the prior information $\pi$ on the state of the system,
our current knowledge $\pi'$ of the system after observing
$\theta$ under action $a$ can be easily obtained via Bayes’ rule:

$$\pi' = [\pi'_1, \dots, \pi'_M] = T(\pi \mid a, \theta), \tag{1}$$

$$\pi'_j = \frac{\sum_{i=1}^{M} \pi_i \, p_{i,j} \, \Pr[\Theta_{j,a} = \theta]}{\sum_{i=1}^{M} \sum_{j=1}^{M} \pi_i \, p_{i,j} \, \Pr[\Theta_{j,a} = \theta]}, \tag{2}$$

where $T(\pi \mid a, \theta)$ is the updated information vector from $\pi$
based on observation $\theta$ and action $a$.
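
The Bayes update in (1) and (2) can be written out directly. The sketch below is illustrative only: it assumes error-free sensing, so that $\Pr[\Theta_{j,a} = \theta]$ is 1 when channel $a$ has availability $\theta$ in state $j$ and 0 otherwise, uses 0-based channel and state indices with channel 1 as the most significant bit, and hard-codes a joint transition matrix for two channels with the hypothetical parameters $\alpha = (0.2, 0.3)$, $\beta = (0.8, 0.6)$.

```python
def channel_state(j, a, num_channels):
    """Availability (0 or 1) of channel a (0-based) in joint state j."""
    return (j >> (num_channels - 1 - a)) & 1

def update_belief(pi, P, a, theta, num_channels):
    """Return T(pi | a, theta) following (1)-(2), assuming error-free sensing.

    pi: belief over joint states before the state transition
    P: joint transition matrix, P[i][j] = p_{i,j}
    a: sensed channel (0-based); theta: observed availability (0 or 1)
    """
    M = len(pi)
    unnormalized = []
    for j in range(M):
        predicted = sum(pi[i] * P[i][j] for i in range(M))
        likelihood = 1.0 if channel_state(j, a, num_channels) == theta else 0.0
        unnormalized.append(predicted * likelihood)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]

# Joint matrix for N = 2 with alpha = (0.2, 0.3), beta = (0.8, 0.6);
# states are ordered (0,0), (0,1), (1,0), (1,1).
P = [[0.56, 0.24, 0.14, 0.06],
     [0.32, 0.48, 0.08, 0.12],
     [0.14, 0.06, 0.56, 0.24],
     [0.08, 0.12, 0.32, 0.48]]
pi = [0.25, 0.25, 0.25, 0.25]
# Sense the first channel and observe it idle.
posterior = update_belief(pi, P, a=0, theta=1, num_channels=2)
```

After observing the first channel idle, all probability mass moves to the two joint states in which that channel is available.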

We have assumed that the state transition probabilities
$\{\alpha_i, \beta_i\}$ are known. In practice, they may not be available.
The problem then becomes one of POMDP with unknown
transition probabilities. Such formulations and algorithms
exist in the literature [24] and they are applicable to our
problem.

3.3.  Optimal  Strategy 

The optimal policy determines the action in each decision
interval so that the expected total reward is maximized. Let
$V_n(\pi)$ denote the maximum expected reward that can be
accrued in the remaining $n$ decision intervals when the current
information vector is $\pi$. We can obtain a recursive equation
for $V_n(\pi)$ as

$$V_n(\pi) = \max_{a = 1, \dots, N} \Big\{ \sum_{i=1}^{M} \pi_i \sum_{j=1}^{M} p_{i,j} \sum_{\theta=0}^{1} \Pr[\Theta_{j,a} = \theta] \big( w_{a,\theta} + V_{n-1}(T(\pi \mid a, \theta)) \big) \Big\}, \tag{3}$$

where $T(\pi \mid a, \theta)$ is the a posteriori information vector given
by (1).
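
For very small $N$ and $n$, the recursion (3) can be evaluated by exhaustive enumeration. The sketch below is a brute-force illustration, not the linear programming procedure of [4]; it assumes error-free sensing, the reward $w_{a,\theta} = \theta B_a$, 0-based indices with channel 1 as the most significant bit of the state index, and hypothetical parameter values.

```python
def joint_matrix(alphas, betas):
    """Joint transition matrix for independent two-state channels."""
    P = [[1.0]]
    for al, be in zip(alphas, betas):
        C = [[1.0 - al, al], [1.0 - be, be]]
        P = [[p * c for p in rp for c in rc] for rp in P for rc in C]
    return P

def value(pi, n, P, B):
    """V_n(pi) from (3), by exhaustive recursion (error-free sensing)."""
    if n == 0:
        return 0.0
    N, M = len(B), len(pi)
    # Belief over the joint state after this slot's transition.
    predicted = [sum(pi[i] * P[i][j] for i in range(M)) for j in range(M)]
    best = 0.0
    for a in range(N):
        total = 0.0
        for theta in (0, 1):
            # Keep only states in which channel a has availability theta.
            mass = [predicted[j] if ((j >> (N - 1 - a)) & 1) == theta else 0.0
                    for j in range(M)]
            prob = sum(mass)
            if prob > 0.0:
                posterior = [m / prob for m in mass]
                total += prob * (theta * B[a] + value(posterior, n - 1, P, B))
        best = max(best, total)
    return best

P = joint_matrix([0.2, 0.3], [0.8, 0.6])
B = [1.0, 1.0]
pi = [0.0, 0.0, 0.0, 1.0]  # both channels known to be idle
v1 = value(pi, 1, P, B)
v2 = value(pi, 2, P, B)
```

The running time of this enumeration grows exponentially in $n$, which is one reason the structural results that follow matter.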

It is shown in [4] that $V_n(\pi)$ is piecewise linear and convex
and can thus be written as

$$V_n(\pi) = \max_{k} \pi \gamma_k(n) \tag{4}$$

for some finite set of $M$-dimensional column vectors $\{\gamma_k(n)\}$.
In other words, the space of information vectors can be divided
into a finite number of convex regions separated by
hyperplanes. Within each region, $V_n(\pi) = \pi \gamma_k(n)$ for some
$k$. Following the example given in [4], we illustrate the structure
of $V_n(\pi)$ in Figure 4. We consider a three-state system.
An information vector $\pi$ is represented by a point in
an equilateral triangle. As shown in Figure 4, entries of $\pi$
are given by the distances to the sides of the triangle. After
we observe $\theta$ in a decision interval, the information vector is
transformed into a point in the space of information vectors
for the succeeding decision interval (see (1)). For the example
given in Figure 4, the space of information vectors when
there are $n - 1$ decision intervals remaining is partitioned
into four regions with the corresponding $\gamma$-vectors given by
$\{\gamma_1(n-1), \dots, \gamma_4(n-1)\}$.




The piecewise linearity and convexity of $V_n(\pi)$ lead to
a linear programming procedure for calculating the optimal
policy and the corresponding expected reward. Specifically,
from (3) and (4) we obtain

$$V_n(\pi) = \max_{a = 1, \dots, N} \Big\{ \sum_{i=1}^{M} \pi_i \sum_{j=1}^{M} p_{i,j} \sum_{\theta=0}^{1} \Pr[\Theta_{j,a} = \theta] \big( w_{a,\theta} + T(\pi \mid a, \theta) \, \gamma_{l(\pi, a, \theta)}(n-1) \big) \Big\}, \tag{5}$$

where $l(\pi, a, \theta)$ denotes the corresponding $\gamma$-vector index
for the region containing the transformed information vector
$T(\pi \mid a, \theta)$. Thus, if the set of $\gamma$-vectors for $V_{n-1}(\pi)$ has
been calculated, we can obtain from (5) the optimal action
and the corresponding $\gamma$-vector for any specified information
vector for the $n$-horizon case. A linear programming
algorithm is provided in [4] for computing the $\gamma$-vectors and
the corresponding mapping of these vectors onto the set of
actions. Thus, the optimal policy is given by the partition of
the space of the information vectors into convex regions, the
$\gamma$-vectors associated with each convex region, and the mapping
between the $\gamma$-vectors and the optimal actions. Note
that the computation of the partition, the $\gamma$-vectors, and the
mapping can be done off-line and the result stored in a table.

4.  SUFFICIENT  STATISTIC  WITH  REDUCED 
DIMENSION 

We exploit the structure of the underlying Markov process
to reduce the dimension of the sufficient statistic from
exponential to linear (in $N$). Based on this sufficient statistic with
reduced dimension, we derive a suboptimal greedy algorithm
that maximizes per-slot throughput.


4.1.  Reduced-State  POMDP 

As stated in Section 3, the $2^N$-dimensional information vector
$\pi$ is a sufficient statistic. We show in this section that
by exploiting the specific structure of the underlying Markov
process, we can obtain a sufficient statistic with dimension
reduced from exponential to linear with respect to $N$, i.e.,
from $M = 2^N$ to $N$.

Proposition 1  Let $\Lambda = [\lambda_1, \dots, \lambda_N]$, where $\lambda_i$ is the
probability that channel $i$ is available to the secondary user. Then
at any time, $\Lambda$ is a sufficient statistic for the above specified
DSA system.

Proof: To prove that $\Lambda$ is a sufficient statistic, we need to show
that (i) $\Lambda$ summarizes all the information on the availability
of each channel obtained from the history of observations;
(ii) the maximum expected reward $V_n(\cdot)$ is completely
determined by $\Lambda$.

To show (i), we define $I(t)$ as the total available information
about the process at the end of decision interval $t$.
Note that the time variable $t$ increases with time, while
the time variable $n$ used in the rest of the paper denotes the
number of remaining decision intervals and thus decreases with
time. Since the only information obtained during decision
interval $t$ is our observation $\Theta_a(t)$ under action $a(t)$, we have

$$I(t) = \{a(t), \Theta_a(t), I(t-1)\}. \tag{6}$$

Let $S_i(t)$ denote the state of channel $i$ in slot $t$. We then have

$$
\begin{aligned}
&\Pr[\text{channel } i \text{ is available in slot } t \mid I(t)] \\
&\quad = \Pr[S_i(t) = 1 \mid I(t)] \\
&\quad = \Pr[S_i(t) = 1 \mid a(t), \Theta_a(t), I(t-1)] \\
&\quad = \begin{cases}
1 & \text{if } a(t) = i,\ \Theta_a(t) = 1 \\
0 & \text{if } a(t) = i,\ \Theta_a(t) = 0 \\
\lambda_i(t-1)\beta_i + \bar{\lambda}_i(t-1)\alpha_i & \text{if } a(t) \neq i
\end{cases}
\end{aligned}
\tag{7}
$$

where $\bar{\lambda}_i(t-1) = 1 - \lambda_i(t-1)$, and (7) follows from the
independent Markov model on the dynamics of the channels.
From (7) we see that the calculation of $\Lambda(t)$ based on the
whole history of observations requires only $\Lambda(t-1)$ and
the newly obtained information in decision interval $t$. Thus,
$\Lambda(t-1)$ summarizes all the information on channel availability
gained prior to decision interval $t$ and represents a sufficient
statistic for the past sequence of observations $I(t-1)$.
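
The update in (7) only needs one number per channel. As a minimal sketch under the same assumptions as (7) (error-free sensing, independent channels), with 0-based channel indices and hypothetical parameter values:

```python
def update_availability(lam, alphas, betas, a, theta):
    """Update the per-channel idle probabilities following (7).

    lam[i]: probability that channel i was idle at the end of the previous slot
    a: channel sensed in this slot (0-based); theta: its observed state (0 or 1)
    """
    updated = []
    for i, (l, alpha, beta) in enumerate(zip(lam, alphas, betas)):
        if i == a:
            # The sensed channel's state is known exactly.
            updated.append(float(theta))
        else:
            # Unsensed channels are propagated one step through their Markov chain.
            updated.append(l * beta + (1.0 - l) * alpha)
    return updated

# Illustrative values: sense the first channel and find it idle.
lam = update_availability([0.5, 0.5], alphas=[0.2, 0.3], betas=[0.8, 0.6],
                          a=0, theta=1)
```

The cost of each update is linear in $N$, in contrast to the $2^N$-dimensional update of the full information vector.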

We now prove (ii) by induction. For $n = 1$, we have

\[ V_1(\Lambda) = \max_{a=1,\dots,N} \bigl(\lambda_a\beta_a + (1-\lambda_a)\alpha_a\bigr) B_a. \tag{8} \]

Clearly, $V_1(\cdot)$ is completely determined by $\Lambda$. Assume
$V_{n-1}(\cdot)$ is determined by $\Lambda$. It then follows that

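\[
\begin{aligned}
V_n(\Lambda)
&= \max_{a=1,\dots,N} \Bigl\{ (\lambda_a\beta_a + \bar{\lambda}_a\alpha_a) B_a
 + \sum_{\theta=0}^{1} \Pr[\Theta_a = \theta \mid \Lambda, a]\,
   V_{n-1}(\mathcal{T}(\Lambda|a,\theta)) \Bigr\} \\
&= \max_{a=1,\dots,N} \Bigl\{ (\lambda_a\beta_a + \bar{\lambda}_a\alpha_a) B_a
 + (\lambda_a\bar{\beta}_a + \bar{\lambda}_a\bar{\alpha}_a)\,
   V_{n-1}(\mathcal{T}(\Lambda|a,0)) \\
&\qquad\qquad + (\lambda_a\beta_a + \bar{\lambda}_a\alpha_a)\,
   V_{n-1}(\mathcal{T}(\Lambda|a,1)) \Bigr\}
\end{aligned}
\tag{9}
\]

where $\bar{\beta}_a = 1 - \beta_a$, $\bar{\alpha}_a = 1 - \alpha_a$, and
$\mathcal{T}(\Lambda|a,\theta)$ denotes the updated information on channel
availability given the observation $\theta$ under action $a$. From (7)
we know that $\mathcal{T}(\Lambda|a,\theta)$ is completely determined by
$\Lambda$ for given $\theta$ and $a$. We then conclude from (9) that
$\Lambda$ is a sufficient statistic for calculating $V_n(\cdot)$.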

□

Proposition 1 shows that by exploiting the statistical independence
among channels, we can reduce the dimension of the sufficient
statistic from $2^N$ to $N$. The optimal policy can thus be obtained
from the space of information vectors $\Lambda$, whose dimension
increases linearly instead of exponentially with the number of
channels (see (9) for the recursive equation on $V_n(\Lambda)$). This
result has the potential of significantly reducing the computational
complexity and memory requirement, as demonstrated in the greedy
approach presented next.
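As a minimal sketch of the recursion in (8) and (9), the following Python function evaluates the maximum expected reward by direct recursion over the N-dimensional information vector. The function name `optimal_value`, the 0-based channel indices, and the convention that the reward is zero when no slots remain are assumptions of this sketch. The number of reachable vectors still grows quickly with the horizon, so it is meant for small examples only.

```python
from functools import lru_cache
from typing import Sequence, Tuple


def optimal_value(lam: Sequence[float], alpha: Sequence[float],
                  beta: Sequence[float], B: Sequence[float], n: int) -> float:
    """Evaluate V_n(Lambda) by direct recursion on (8)-(9)."""
    N = len(lam)

    def transform(l: Tuple[float, ...], a: int, theta: int) -> Tuple[float, ...]:
        # Update (7): the sensed channel is known, the rest are propagated.
        return tuple(
            float(theta) if i == a else l[i] * beta[i] + (1.0 - l[i]) * alpha[i]
            for i in range(N)
        )

    @lru_cache(maxsize=None)
    def V(l: Tuple[float, ...], k: int) -> float:
        if k == 0:
            return 0.0  # assumed convention: no slots left, no reward
        best = float("-inf")
        for a in range(N):
            # Probability that channel a is available in the current slot.
            p_idle = l[a] * beta[a] + (1.0 - l[a]) * alpha[a]
            value = (p_idle * B[a]
                     + (1.0 - p_idle) * V(transform(l, a, 0), k - 1)
                     + p_idle * V(transform(l, a, 1), k - 1))
            best = max(best, value)
        return best

    return V(tuple(lam), n)
```

The state passed between recursion levels is a tuple of N probabilities, not a distribution over all 2^N joint channel states, which is the reduction stated in Proposition 1.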

4.2.  A  Greedy  Approach 

Searching for the optimal policy can be computationally intense,
especially when $T$ and $N$ are large. In this section, we propose a
suboptimal protocol based on a greedy approach that maximizes
per-slot throughput. Since $\Lambda$ is a sufficient statistic, the
optimal action $a_*$ that maximizes per-slot throughput is completely
determined by $\Lambda$. Specifically,

\[ a_* = \arg\max_{a=1,\dots,N} (\lambda_a\beta_a + \bar{\lambda}_a\alpha_a) B_a. \tag{10} \]

Let $W_n(\Lambda)$ denote the expected throughput in the remaining
$n$ slots achieved by the greedy approach. We obtain a recursive
equation for $W_n(\Lambda)$ as

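\[
\begin{aligned}
W_n(\Lambda)
&= \max_{a=1,\dots,N} \bigl\{ (\lambda_a\beta_a + \bar{\lambda}_a\alpha_a) B_a \bigr\}
 + \sum_{\theta=0}^{1} \Pr[\Theta_{a_*} = \theta \mid \Lambda, a_*]\,
   W_{n-1}(\mathcal{T}(\Lambda|a_*,\theta)) \\
&= \max_{a=1,\dots,N} \bigl\{ (\lambda_a\beta_a + \bar{\lambda}_a\alpha_a) B_a \bigr\}
 + (\lambda_{a_*}\bar{\beta}_{a_*} + \bar{\lambda}_{a_*}\bar{\alpha}_{a_*})\,
   W_{n-1}(\mathcal{T}(\Lambda|a_*,0)) \\
&\qquad + (\lambda_{a_*}\beta_{a_*} + \bar{\lambda}_{a_*}\alpha_{a_*})\,
   W_{n-1}(\mathcal{T}(\Lambda|a_*,1))
\end{aligned}
\tag{11}
\]

where $a_*$ is given by (10), $\bar{\beta}_a = 1 - \beta_a$,
$\bar{\alpha}_a = 1 - \alpha_a$, and $\mathcal{T}(\Lambda|a,\theta)$ denotes
the updated information on channel availability given the observation
$\theta$ under action $a$. The calculation of
$\mathcal{T}(\Lambda|a,\theta)$ is given in (7).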
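The greedy rule in (10) and the recursion in (11) can be sketched as follows. The helper names (`idle_prob`, `greedy_action`, `greedy_value`), the 0-based indices, and the zero reward at horizon 0 are assumptions made for this illustration. Ties in (10) are broken in favor of the lowest channel index, which the text does not specify.

```python
from typing import List, Sequence


def idle_prob(l: float, al: float, be: float) -> float:
    """Probability that a channel is available in the current slot."""
    return l * be + (1.0 - l) * al


def greedy_action(lam: Sequence[float], alpha: Sequence[float],
                  beta: Sequence[float], B: Sequence[float]) -> int:
    """Channel maximizing the expected per-slot throughput, as in (10)."""
    rewards = [idle_prob(l, al, be) * bw
               for l, al, be, bw in zip(lam, alpha, beta, B)]
    return max(range(len(rewards)), key=rewards.__getitem__)


def greedy_value(lam: Sequence[float], alpha: Sequence[float],
                 beta: Sequence[float], B: Sequence[float], n: int) -> float:
    """Expected throughput W_n(Lambda) of the greedy rule, as in (11)."""
    if n == 0:
        return 0.0  # assumed convention: no slots left, no reward
    a = greedy_action(lam, alpha, beta, B)
    p = idle_prob(lam[a], alpha[a], beta[a])

    def transform(theta: int) -> List[float]:
        # Update (7) applied after sensing channel a with outcome theta.
        return [float(theta) if i == a
                else idle_prob(lam[i], alpha[i], beta[i])
                for i in range(len(lam))]

    return (p * B[a]
            + (1.0 - p) * greedy_value(transform(0), alpha, beta, B, n - 1)
            + p * greedy_value(transform(1), alpha, beta, B, n - 1))
```

Because only one action is followed per slot, the recursion branches over the two observations only, not over all N actions as in (9).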


5.  VARIATIONS  AND  EXTENSIONS 

In this section, we discuss several variations and extensions
of the proposed analytical framework and optimal/suboptimal
DC-MAC protocols. We focus on the variations of the greedy
approach; extensions to the optimal strategy follow directly.

5.1.  Overlook  and  Misidentification  of  Opportunities 

The Markov model for the channel occupancy allows easy
incorporation of overlook and misidentification of spectrum
opportunity. Consider first the overlook of opportunity. Let
$\epsilon_i$ ($i = 1, \dots, N$) denote the probability that an idle
channel $i$ is mis-sensed as busy. The channel $a_*$ selected by the
greedy approach is thus given by

\[ a_* = \arg\max_{a=1,\dots,N} (\lambda_a\beta_a + \bar{\lambda}_a\alpha_a)(1 - \epsilon_a) B_a. \tag{12} \]

A modification of (7) leads to the update of the information
vector:

\[
\mathcal{T}_i(\Lambda|a,\theta) =
\begin{cases}
1 & \text{if } a = i,\ \theta = 1 \\[4pt]
\dfrac{(\lambda_i\beta_i + \bar{\lambda}_i\alpha_i)\,\epsilon_i}
      {1 - (\lambda_i\beta_i + \bar{\lambda}_i\alpha_i)(1 - \epsilon_i)}
  & \text{if } a = i,\ \theta = 0 \\[8pt]
\lambda_i\beta_i + \bar{\lambda}_i\alpha_i & \text{if } a \neq i
\end{cases}
\tag{13}
\]

where $\mathcal{T}_i(\Lambda|a,\theta)$ denotes the $i$th entry of the
transformed information vector. A recursive equation for
$W_n(\Lambda)$ can be readily obtained by modifying
$\Pr[\Theta_{a_*} = \theta]$ in (11).

Consider next the scenario where, with probability $\delta_i$, the
secondary user mistakes a busy channel $i$ for an idle one.
When a misidentification occurs, the user will transmit, leading
to a collision. We assume instantaneous and error-free
acknowledgement from the receiver to the secondary user after
a successful transmission. The secondary user can thus
identify the event of collision and use this information to ensure
correct update of the information vector. Specifically,
letting $K \in \{0, 1\}$ denote whether the secondary user receives
the acknowledgement ($K = 1$ indicates the reception of
acknowledgement), we have

\[
\mathcal{T}_i(\Lambda|a,\theta,K) =
\begin{cases}
1 & \text{if } a = i,\ \theta = 1,\ K = 1 \\
0 & \text{if } a = i,\ \theta = 1,\ K = 0 \\
0 & \text{if } a = i,\ \theta = 0 \\
\lambda_i\beta_i + \bar{\lambda}_i\alpha_i & \text{if } a \neq i
\end{cases}
\]

The channel $a_*$ selected by the greedy approach in this case
is given by (10), and a recursive equation for $W_n(\Lambda)$ can be
obtained by modifying (11). The above two scenarios can be
easily combined to model both overlook and misidentification
of spectrum opportunity.

Besides sensing error, hidden and exposed terminals can
also lead to overlook and misidentification of spectrum
opportunities. Specifically, a hidden terminal (a transmitting
node within the range of the receiver but not the transmitter)
results in a misidentification while an exposed terminal
(a transmitting node within the range of the transmitter but
not the receiver) causes an opportunity overlook. The problem
of hidden and exposed terminals can be handled in the
same manner as sensing errors discussed above. Note that
while opportunity overlook and misidentification degrade the
throughput performance, they do not affect the synchronization
between the transmitter and the receiver. By checking
whether a packet has been successfully received, the receiver
is aware of the sensing results at the transmitter and
can thus maintain the same update of the information vector
as the transmitter. Similarly, the acknowledgement scheme
discussed above ensures that both the transmitter and the
receiver can incorporate misidentification into the update of the
information vector.

5.2.  More  General  Sensing/Access  Models

It is straightforward to extend the proposed DC-MAC framework
and protocols to accommodate more general hardware
models. Specifically, the user can sense up to $L_1$ channels
and access up to $L_2$ channels simultaneously. The extension
of the greedy approach to this general case is straightforward.
The optimal strategy also follows directly after modifying the
action space to include all $\binom{N}{L_1}$ possibilities of channel
selection. It is obvious that without considering cost in sensing
and transmission, actions that select fewer than $L_1$ channels
should not be considered, and the channels to access are
those $L_2$ idle channels with the largest bandwidth.


6.  NUMERICAL  AND  SIMULATION  RESULTS

We present in this section numerical and simulation examples
on the performance of the proposed DC-MAC protocols.
We focus on the performance comparison between the
optimal and suboptimal greedy approaches and the effect of
opportunity overlook on the performance of DC-MAC.



6.1.  Optimal  vs.  Suboptimal  Approaches 

In Figure 5 and Figure 6 we compare the transmission rate
(in bits/slot) achieved by the optimal and the suboptimal
protocols proposed in this paper. As shown in Figure 5, the
transmission rate achieved by the greedy approach matches
that of the optimal scheme in this particular setup. For the
three-channel scenario considered in Figure 6, the performance
loss of the greedy approach is within 4%. These examples
demonstrate the near-optimal performance achieved
by the greedy approach at a much lower complexity. We
point out that the transmission rate increases over time. This
is due to the improved information about the state of the
system drawn from accumulating observations, demonstrating
the cognitive nature of the proposed protocols.


Fig. 5: Transmission rate of the optimal and greedy
strategies ($N = 2$, $B_1 = 1$, $B_2 = 2$, $\alpha_1 = 0.44$,
$\beta_1 = 0.23$, $\alpha_2 = 0.28$, $\beta_2 = 0.12$, the initial information
vector is set to the stationary distribution).


Fig. 6: Transmission rate of the optimal and greedy
strategies ($N = 3$, $B = [0.9, 1, 0.8]$, $\alpha = [0.1, 0.5, 0.8]$,
$\beta = [0.5, 0.4, 0.3]$, the initial information vector is set to
the stationary distribution).
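The captions above state that the initial information vector is set to the stationary distribution. Under the two-state Markov model implied by (7), where a busy channel becomes available with probability alpha and an available channel stays available with probability beta, the stationary availability solves pi = pi * beta + (1 - pi) * alpha. The short sketch below computes it; the function name `stationary_availability` is hypothetical.

```python
from typing import List, Sequence


def stationary_availability(alpha: Sequence[float],
                            beta: Sequence[float]) -> List[float]:
    """Stationary probability that each channel is available.

    Solving pi = pi * beta + (1 - pi) * alpha gives
    pi = alpha / (1 + alpha - beta). The degenerate case alpha = 0 with
    beta = 1 (a channel that never changes state) is not handled here.
    """
    return [al / (1.0 + al - be) for al, be in zip(alpha, beta)]


if __name__ == "__main__":
    # Illustrative transition probabilities, chosen only for the demo.
    print(stationary_availability([0.1, 0.5, 0.8], [0.5, 0.4, 0.3]))
```

Starting from this vector means that the first decision is made with no information beyond the long-run channel statistics.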


6.2.  Robustness  to  Opportunity  Overlook 

In this simulation example, we study the performance of the
suboptimal greedy approach in the presence of opportunity
overlook. As given in (13), incorporating opportunity overlook
into the update of the information vector requires the
knowledge of the overlook probabilities $\{\epsilon_i\}_{i=1}^{N}$. We are
particularly interested in the performance of the greedy
algorithm in the presence of overlook without the knowledge
of the overlook probabilities. Shown in Figure 7 is the
simulation result where we study the throughput of the greedy
approach as a function of the overlook probability
$\epsilon_1 = \cdots = \epsilon_N = \epsilon$. We consider here three cases:
the greedy approach in the absence of overlook, and the greedy
approach in the presence of overlook with and without the
knowledge of the overlook probability $\epsilon$. Without the
knowledge of $\epsilon$, the information vector is updated according
to (7) as if $\epsilon = 0$. From Figure 7 we can see that overlook
results in linear degradation in throughput since only a fraction
$1 - \epsilon$ of spectrum opportunities are utilized. The performance
of the greedy approach is, however, robust to the lack of
knowledge on the overlook probability. Without the knowledge of
$\epsilon$ to perform the correct update of the information vector,
the performance loss is marginal.

Fig. 7: Transmission rate of the optimal and greedy
strategies ($N = 3$, $B = [0.9, 1, 0.8]$, $\alpha = [0.4, 0.6, 0.8]$,
$\beta = [0.9, 0.7, 0.5]$, $T = 30$, the initial information vector
is set to the stationary distribution).
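The experiment described in this subsection can be imitated with a small Monte Carlo sketch. Everything here is an assumption made for illustration: the function names, the 0-based indices, the parameter values in the demo, and in particular the sensed-busy branch of the update, which is written out by Bayes' rule in the spirit of (13). Passing `assumed_eps` equal to zero reproduces the case where the user updates according to (7) without knowledge of the overlook probability.

```python
import random
from typing import List, Sequence


def predict(l: float, al: float, be: float) -> float:
    """Probability that a channel is idle in the current slot, as in (7)."""
    return l * be + (1.0 - l) * al


def update_with_overlook(lam: Sequence[float], alpha: Sequence[float],
                         beta: Sequence[float], eps: Sequence[float],
                         a: int, theta: int) -> List[float]:
    """Information-vector update when an idle channel i is sensed busy
    with probability eps[i] (Bayes' rule, an assumption of this sketch)."""
    updated = []
    for i in range(len(lam)):
        p = predict(lam[i], alpha[i], beta[i])
        if i != a:
            updated.append(p)        # unsensed channel: propagate only
        elif theta == 1:
            updated.append(1.0)      # a busy channel is never sensed idle here
        else:
            denom = p * eps[i] + (1.0 - p)   # Pr[channel i is sensed busy]
            updated.append(p * eps[i] / denom if denom > 0.0 else 0.0)
    return updated


def simulate_greedy(alpha: Sequence[float], beta: Sequence[float],
                    B: Sequence[float], true_eps: Sequence[float],
                    assumed_eps: Sequence[float], T: int,
                    rng: random.Random) -> float:
    """Average per-slot throughput of the greedy rule over T slots."""
    N = len(alpha)
    # Start the belief and the true channel states from the stationary law.
    lam = [al / (1.0 + al - be) for al, be in zip(alpha, beta)]
    state = [1 if rng.random() < l else 0 for l in lam]
    total = 0.0
    for _ in range(T):
        # Each channel evolves independently as a two-state Markov chain.
        state = [1 if rng.random() < (beta[i] if state[i] else alpha[i]) else 0
                 for i in range(N)]
        # Greedy choice using the overlook probabilities the user assumes.
        a = max(range(N), key=lambda i: predict(lam[i], alpha[i], beta[i])
                * (1.0 - assumed_eps[i]) * B[i])
        # An idle channel is overlooked with the true probability.
        sensed_idle = state[a] == 1 and rng.random() >= true_eps[a]
        if sensed_idle:
            total += B[a]
        lam = update_with_overlook(lam, alpha, beta, assumed_eps,
                                   a, int(sensed_idle))
    return total / T


if __name__ == "__main__":
    alpha, beta, B = [0.2, 0.4, 0.6], [0.8, 0.6, 0.4], [1.0, 1.0, 1.0]
    eps = [0.2, 0.2, 0.2]
    informed = simulate_greedy(alpha, beta, B, eps, eps, 10000, random.Random(0))
    uninformed = simulate_greedy(alpha, beta, B, eps, [0.0] * 3, 10000,
                                 random.Random(0))
    print(informed, uninformed)
```

Comparing the two printed averages over a range of overlook probabilities gives a rough counterpart of the informed and uninformed curves discussed above; the sketch makes no claim to reproduce the reported figures.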


7.  CONCLUSION 

We have presented an optimization framework for decentralized
cognitive MAC for opportunistic spectrum access. The
protocol is based on carrier sensing that allows proper
assignment of priorities among primary and secondary users.
The POMDP formulation allows us to derive optimal and
low-complexity suboptimal protocols that maximize overall
network throughput. Numerical evaluation and simulations
indicate that the suboptimal algorithm provides near-optimal
performance.

The proposed approach can be extended in a number of
areas. Existing carrier sensing protocols for ad hoc networks
can be easily incorporated into our framework. Opportunistic
communication techniques based on channel realizations are
also compatible with our framework. Of particular interest
are POMDP techniques that do not assume a priori knowledge
of transition probabilities.